{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8cc1eec0",
   "metadata": {},
   "source": [
    "OpenMLDB sdk init: connect to cluster, and register for sql magic\n",
    "Plz do `/work/init.sh` to create the OpenMLDB cluster, and `python3 /work/talkingdata/predict_server.py --no-init > predict.log 2>&1 &` to start the simple predict server(receive the deployed sql and model, request it to do online feature extraction)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7dbf87e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import openmldb\n",
    "db = openmldb.dbapi.connect(database='demo_db',zk='127.0.0.1:2181',zkPath='/openmldb')\n",
    "openmldb.sql_magic.register(db)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1829fe58",
   "metadata": {},
   "source": [
    "The database and table name, which will be used later"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "04bc2d08",
   "metadata": {},
   "outputs": [],
   "source": [
    "db_name=\"demo_db\"\n",
    "table_name=\"talkingdata\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "691bbd5b",
   "metadata": {},
   "source": [
    "You can use variables like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1221a2fe",
   "metadata": {},
   "outputs": [],
   "source": [
    "var='1'\n",
    "%sql SELECT {var};\n",
    "%sql SELECT $var;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "513d4aa6",
   "metadata": {},
   "source": [
    "Create database and table(talking data schema)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f9e500a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql create database if not exists $db_name;\n",
    "%sql use $db_name;\n",
    "%sql create table if not exists $table_name (ip int, app int, device int, os int, channel int, click_time timestamp, is_attributed int, click_id int, hour int, day int);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9add0289",
   "metadata": {},
   "source": [
    "Offline load data and extract feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8c4e708",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql set @@execute_mode='offline';\n",
    "%sql set @@sync_job=true;\n",
    "%sql set @@job_timeout=600000;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "672b3e84",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql load data infile 'file:///work/talkingdata/train_sample.csv' into table $table_name options(mode='overwrite');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1185ab81",
   "metadata": {},
   "outputs": [],
   "source": [
    "sql_part = f\"\"\"\n",
    "select is_attributed, app, device, os, channel, hour(click_time) as hour, day(click_time) as day, \n",
    "count(channel) over w1 as qty, \n",
    "count(channel) over w2 as ip_app_count, \n",
    "count(channel) over w3 as ip_app_os_count  \n",
    "from {table_name} \n",
    "window \n",
    "w1 as (partition by ip order by click_time ROWS_RANGE BETWEEN 1h PRECEDING AND CURRENT ROW), \n",
    "w2 as(partition by ip, app order by click_time ROWS_RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),\n",
    "w3 as(partition by ip, app, os order by click_time ROWS_RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)\n",
    "\"\"\"\n",
    "\n",
    "train_feature_dir='/tmp/train_feature'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee0cb1cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql {sql_part} INTO OUTFILE '{train_feature_dir}' OPTIONS(mode='overwrite');"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "078bb8fb",
   "metadata": {},
   "source": [
    "Train: we use a simple train script to do it, and save the model to 'model_path'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1bc87b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import xgboost_train_sample\n",
    "model_path='/tmp/model.json'\n",
    "xgboost_train_sample.train(f'{train_feature_dir}/*.csv', model_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f47ed227",
   "metadata": {},
   "source": [
    "Deploy sql & model, and load data in online mode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95e54d38",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql SET @@execute_mode='online';"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af8f5b18",
   "metadata": {},
   "outputs": [],
   "source": [
    "deploy_name='d1'\n",
    "%sql DEPLOY $deploy_name $sql_part;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ccf50448",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql load data infile 'file:///work/talkingdata/train_sample.csv' into table $table_name options(mode='append');"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad607b01",
   "metadata": {},
   "source": [
    "Let the predict server know the sql and model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fab5dbc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "predict_server='localhost:8881'\n",
    "infos = {'database': db_name, 'deployment': deploy_name, 'model_path': model_path}\n",
    "res = requests.post('http://' + predict_server + '/update', json=infos)\n",
    "res.text"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96323097",
   "metadata": {},
   "source": [
    "Then you can request the predict server to test\n",
    "`python3 predict.py`"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
